Brain decoding with SVM

Support vector machines

_images/optimal-hyperplane.png

Fig. 4 An SVM aims to find an optimal hyperplane that separates two classes in a high-dimensional space while maximizing the margin. Image from the scikit-learn SVM documentation, under the BSD 3-Clause license.

We are going to train a support vector machine (SVM) classifier for brain decoding on the Haxby dataset. SVMs often perform well in high-dimensional spaces, and they are a popular technique in neuroimaging.

In the SVM algorithm, each data item is represented as a point in an N-dimensional space, where N is the number of features. The objective is to find a hyperplane, i.e. a decision boundary of dimension N-1 that separates the classes (e.g. with 3 features the hyperplane is a two-dimensional plane), with the maximum margin, i.e. the maximum distance between the hyperplane and the closest data points of each class. Data points falling on either side of the hyperplane are attributed to different classes.
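To make the notions of hyperplane and margin concrete, here is a minimal sketch on toy two-dimensional data. The data, the variable names and the large value of C (used to approximate a hard margin) are assumptions chosen for illustration and are unrelated to the Haxby dataset. For a linear SVM with decision function w.x + b, the width of the margin is 2 / ||w||.

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated toy clusters in 2D (hypothetical data, not fMRI)
rng = np.random.RandomState(0)
X_toy = np.vstack([rng.normal(loc=-2.0, scale=0.5, size=(20, 2)),
                   rng.normal(loc=2.0, scale=0.5, size=(20, 2))])
y_toy = np.array([0] * 20 + [1] * 20)

# A large C approximates a hard-margin SVM on separable data
toy_svm = SVC(kernel='linear', C=1000).fit(X_toy, y_toy)

w = toy_svm.coef_[0]        # normal vector of the hyperplane w.x + b = 0
b = toy_svm.intercept_[0]   # offset of the hyperplane
margin_width = 2 / np.linalg.norm(w)  # distance between the two margin boundaries

print('hyperplane normal:', w, 'offset:', b)
print('margin width:', margin_width)
print('number of support vectors:', len(toy_svm.support_vectors_))
```

Only the support vectors, i.e. the points lying on or inside the margin, determine the position of the hyperplane.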

The scikit-learn documentation contains a detailed description of the different variants of SVM, as well as examples of applications with simple datasets.

Getting the data

We are going to download the dataset from Haxby and colleagues (2001) [HGF+01]. You can check section An overview of the Haxby dataset for more details on that dataset. Here we are going to quickly download it and prepare it for machine learning applications, with a set of predictive variables, the brain time series X, and a dependent variable, the cognitive annotations y.

import os
import warnings
warnings.filterwarnings(action='once')

from nilearn import datasets
# We are fetching the data for subject 4
data_dir = os.path.join('..', 'data')
sub_no = 4
haxby_dataset = datasets.fetch_haxby(subjects=[sub_no], fetch_stimuli=True, data_dir=data_dir)
func_file = haxby_dataset.func[0]

# mask the data
from nilearn.input_data import NiftiMasker
mask_filename = haxby_dataset.mask_vt[0]
masker = NiftiMasker(mask_img=mask_filename, standardize=True, detrend=True)
X = masker.fit_transform(func_file)

# cognitive annotations
import pandas as pd
behavioral = pd.read_csv(haxby_dataset.session_target[0], delimiter=' ')
y = behavioral['labels']

Let’s check the size of X and y:

categories = y.unique()
print(categories)
print(y.shape)
print(X.shape)
['rest' 'face' 'chair' 'scissors' 'shoe' 'scrambledpix' 'house' 'cat'
 'bottle']
(1452,)
(1452, 675)

So we have 1452 time points, with one cognitive annotation each, and for each time point we have recordings of fMRI activity across 675 voxels. We can also see that the cognitive annotations span 9 different categories.
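It is also worth looking at how the 1452 time points are distributed across the 9 categories. The sketch below uses per-category counts obtained by summing the train and test `support` columns of the classification reports shown later in this section. With the real data, `y.value_counts()` should give the same numbers. The variable names are illustrative.

```python
import pandas as pd

# Per-category counts, summed from the train and test `support` columns
# of the classification reports in this section (e.g. rest: 471 + 117)
counts = pd.Series({
    'rest': 588, 'face': 108, 'chair': 108, 'scissors': 108, 'shoe': 108,
    'scrambledpix': 108, 'house': 108, 'cat': 108, 'bottle': 108,
})

n_samples = counts.sum()
# Accuracy obtained by always predicting the most frequent category ('rest')
majority_baseline = counts.max() / n_samples

print('total time points:', n_samples)
print('proportion per category:')
print((counts / n_samples).round(3))
print('majority-class baseline accuracy: {:.3f}'.format(majority_baseline))
```

The dataset is imbalanced: 'rest' accounts for about 40% of the time points, so accuracy values should be compared against that baseline rather than against 1/9.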

Training a model

We are going to start by splitting our dataset into a training set and a test set, keeping 20% of the time points for testing. The remaining 80% could further be split with a 10-fold cross-validation for training/validation; the model below is simply fit on the whole training set.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)   
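A 10-fold cross-validation on the training set could be sketched as follows. This is only an illustration under stated assumptions: `X_demo` and `y_demo` are synthetic stand-ins for `X_train` and `y_train`, and the choice of `StratifiedKFold` with shuffling is one option among several.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for X_train / y_train (assumption: random data, 3 classes)
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(150, 30))
y_demo = np.repeat(['face', 'house', 'cat'], 50)
X_demo[y_demo == 'face', :5] += 1.0     # add some class-specific signal
X_demo[y_demo == 'house', 5:10] += 1.0

# 10 folds: each fold is used once for validation, the other 9 for training
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(SVC(kernel='linear', C=1), X_demo, y_demo, cv=cv)

print('validation accuracy per fold:', np.round(scores, 2))
print('mean validation accuracy: {:.2f}'.format(scores.mean()))
```

With the real data, replacing `X_demo` and `y_demo` by `X_train` and `y_train` would give a validation estimate without touching the test set.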

Now we can initialize an SVM classifier, and train it:

from sklearn.svm import SVC
model_svm = SVC(random_state=0, kernel='linear', C=1)
model_svm.fit(X_train, y_train)
SVC(C=1, kernel='linear', random_state=0)

Assessing performance

Let’s check the accuracy of the prediction on the training set:

from sklearn.metrics import classification_report
y_train_pred = model_svm.predict(X_train)
print(classification_report(y_train, y_train_pred))
              precision    recall  f1-score   support

      bottle       1.00      1.00      1.00        85
         cat       1.00      1.00      1.00        88
       chair       1.00      1.00      1.00        90
        face       1.00      1.00      1.00        81
       house       1.00      1.00      1.00        91
        rest       1.00      1.00      1.00       471
    scissors       1.00      1.00      1.00        81
scrambledpix       1.00      1.00      1.00        90
        shoe       1.00      1.00      1.00        84

    accuracy                           1.00      1161
   macro avg       1.00      1.00      1.00      1161
weighted avg       1.00      1.00      1.00      1161

A perfect score on the training set is suspiciously high and suggests that the model is overfitting. Let’s check on the test set:

y_test_pred = model_svm.predict(X_test)
print(classification_report(y_test, y_test_pred))
              precision    recall  f1-score   support

      bottle       0.72      0.78      0.75        23
         cat       0.67      0.70      0.68        20
       chair       0.74      0.78      0.76        18
        face       0.89      0.93      0.91        27
       house       0.93      0.82      0.87        17
        rest       0.91      0.89      0.90       117
    scissors       0.83      0.74      0.78        27
scrambledpix       0.85      0.94      0.89        18
        shoe       0.72      0.75      0.73        24

    accuracy                           0.84       291
   macro avg       0.81      0.81      0.81       291
weighted avg       0.84      0.84      0.84       291
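The gap between training and test scores depends on the regularization parameter C of the SVC. The following sketch compares both scores for a few values of C. It runs on synthetic data with more features than samples (an assumption meant to mimic the high-dimensional setting), and the chosen values of C are arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic high-dimensional data (hypothetical, not the Haxby dataset)
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(200, 300))
y_demo = np.repeat([0, 1], 100)
X_demo[y_demo == 1, :20] += 0.5   # weak signal in the first 20 features

Xtr, Xte, ytr, yte = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo)

# Smaller C means stronger regularization
results = {}
for C in [1e-4, 1e-2, 1]:
    clf = SVC(kernel='linear', C=C).fit(Xtr, ytr)
    results[C] = (clf.score(Xtr, ytr), clf.score(Xte, yte))
    print('C={:g}  train accuracy={:.2f}  test accuracy={:.2f}'.format(C, *results[C]))
```

A training accuracy far above the test accuracy is the signature of overfitting; a value of C for which the two are closer is usually preferable.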

We can have a look at the confusion matrix:

# confusion matrix
import sys
import numpy as np
from sklearn.metrics import confusion_matrix
sys.path.append('../src')
import visualization
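# pass labels=categories so that rows and columns follow the same order
# as the category names given to the plotting function
cm_svm = confusion_matrix(y_test, y_test_pred, labels=categories)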
model_conf_matrix = cm_svm.astype('float') / cm_svm.sum(axis=1)[:, np.newaxis]

visualization.conf_matrix(model_conf_matrix,
                          categories,
                          title='SVM decoding results on Haxby')
_images/svm_decoding_13_0.png
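The row normalization used above divides each row of the confusion matrix by the number of true samples in that class, so that each row sums to 1. The sketch below checks, on a small set of hypothetical labels, that this is equivalent to the `normalize='true'` option of `confusion_matrix` (available in scikit-learn 0.22 and later).

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels
y_true = ['face', 'face', 'face', 'house', 'house', 'cat', 'cat', 'cat', 'cat']
y_pred = ['face', 'face', 'cat', 'house', 'house', 'cat', 'cat', 'face', 'cat']
labels = ['cat', 'face', 'house']   # fixes the order of rows and columns

cm = confusion_matrix(y_true, y_pred, labels=labels)
# Manual normalization: divide each row by its sum (number of true samples)
manual = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
# Built-in equivalent
builtin = confusion_matrix(y_true, y_pred, labels=labels, normalize='true')

print(manual)
print(np.allclose(manual, builtin))
```

Each diagonal entry of the normalized matrix is the recall of the corresponding category.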

Visualizing the weights

Finally we can visualize the weights of the (linear) classifier to see which brain regions have the most impact on the decision, for example for the first pair of categories:

from nilearn import plotting
# SVC uses a one-vs-one scheme: each row of coef_ compares one pair of class labels.
# With 9 classes, there are 9 * 8 / 2 = 36 distinct pairs, hence 36 rows.
# Classes are sorted alphabetically, so the first row compares 'bottle' and 'cat'.
coef_img = masker.inverse_transform(model_svm.coef_[0, :])
plotting.view_img(
    coef_img, bg_img=haxby_dataset.anat[0],
    title="SVM weights", dim=-1, resampling_interpolation='nearest'
)
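To find which row of `coef_` corresponds to which pair of categories, one can enumerate the pairs in the order used by scikit-learn's one-vs-one scheme ('0 vs 1', '0 vs 2', and so on, over the sorted class labels). The sketch below does this on random data with the same 9 labels. `X_demo`, `y_demo` and `svm_demo` are hypothetical names, and the data carry no signal.

```python
from itertools import combinations

import numpy as np
from sklearn.svm import SVC

labels = ['rest', 'face', 'chair', 'scissors', 'shoe',
          'scrambledpix', 'house', 'cat', 'bottle']

# Random stand-in data: 10 samples per label, 20 features
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(90, 20))
y_demo = np.repeat(labels, 10)

svm_demo = SVC(kernel='linear', C=1).fit(X_demo, y_demo)

# classes_ is sorted; pairs follow the same order as the rows of coef_
pairs = list(combinations(svm_demo.classes_, 2))
face_rows = [i for i, pair in enumerate(pairs) if 'face' in pair]

print('coef_ shape:', svm_demo.coef_.shape)
print('first pair:', pairs[0])
print('rows involving face:', face_rows)
```

With the fitted `model_svm`, the same indexing could be used to select, for instance, the row comparing 'face' and 'house' before calling `masker.inverse_transform`.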

And now the easy way

We can use the high-level Decoder object from Nilearn. See Decoder object for details. It reduces model specification and fit to two lines of code:

from nilearn.decoding import Decoder
# Specify the classifier to the decoder object.
# With the decoder we can input the masker directly.
# We are using the 'svc' estimator here because the decoding is intra-subject.
#
# cv=5 means that we use 5-fold cross-validation
#
# As a scoring scheme, one can use f1, accuracy or ROC-AUC
#
decoder = Decoder(estimator='svc', cv=5, mask=mask_filename, scoring='f1') 
decoder.fit(func_file, y)

That’s it! We can now look at the results: F1 scores and coefficient image:

print('F1 scores')
for category in categories:
    print(category, '\t\t    {:.2f}'.format(np.mean(decoder.cv_scores_[category])))
plotting.view_img(
    decoder.coef_img_['face'], bg_img=haxby_dataset.anat[0],
    title="SVM weights for face", dim=-1, resampling_interpolation='nearest'
)
F1 scores
rest 		    0.80
face 		    0.30
chair 		    0.27
scissors 		    0.25
shoe 		    0.23
scrambledpix 		    0.31
house 		    0.29
cat 		    0.22
bottle 		    0.19

Note: the Decoder implements a one-vs-all strategy, which is in general a better choice than one-vs-one.
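The difference between the two strategies can be seen in the number of binary classifiers that are trained. The sketch below uses scikit-learn's generic `OneVsRestClassifier` and `OneVsOneClassifier` wrappers on random data with 9 classes. It is an illustration of the two schemes, not of how the Decoder is implemented internally.

```python
import numpy as np
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import LinearSVC

# Random stand-in data with 9 classes, 20 samples each
rng = np.random.RandomState(0)
n_classes = 9
X_demo = rng.normal(size=(180, 20))
y_demo = np.repeat(np.arange(n_classes), 20)

# One-vs-all (one-vs-rest): one classifier per class
ovr = OneVsRestClassifier(LinearSVC(max_iter=10000)).fit(X_demo, y_demo)
# One-vs-one: one classifier per pair of classes
ovo = OneVsOneClassifier(LinearSVC(max_iter=10000)).fit(X_demo, y_demo)

print('one-vs-all classifiers:', len(ovr.estimators_))
print('one-vs-one classifiers:', len(ovo.estimators_))
```

One-vs-all yields one weight map per category, which is why the Decoder can return a coefficient image for 'face' directly.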

Getting more meaningful weight maps with FREM

It is often tempting to interpret regions with high weights as ‘important’ for the prediction task. However, there is no statistical guarantee on these maps. Moreover, they often do not even exhibit a very clear structure. To improve on that, regularization can be added by using the so-called Fast Regularized Ensembles of Models (FREM), which rely on simple averaging and clustering tools to provide smoother maps with minimal computational overhead.

from nilearn.decoding import FREMClassifier
frem = FREMClassifier(estimator='svc', cv=5, mask=mask_filename, scoring='f1')
frem.fit(func_file, y)
plotting.view_img(
    frem.coef_img_['face'], bg_img=haxby_dataset.anat[0],
    title="SVM weights for face", dim=-1, resampling_interpolation='nearest'
)

Note that the resulting F1 scores are in general slightly higher:

print('F1 scores with FREM')
for category in categories:
    print(category, '\t\t    {:.2f}'.format(np.mean(frem.cv_scores_[category])))
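To give an intuition of the "clustering and averaging" idea behind FREM, here is a rough sketch built from scikit-learn tools only. It is not Nilearn's implementation: `FeatureAgglomeration` is used as a stand-in for the clustering step, the data are synthetic, and all names and parameter values are assumptions. For each cross-validation fold, features are grouped into clusters, a linear SVC is fit on the reduced data, the weights are projected back to the original feature space, and the resulting maps are averaged.

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# Synthetic two-class data with signal in the first 10 features
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(120, 60))
y_demo = np.repeat([0, 1], 60)
X_demo[y_demo == 1, :10] += 1.0

coefs = []
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train, _ in cv.split(X_demo, y_demo):
    # Group the 60 features into 15 clusters (stand-in for FREM's clustering)
    agglo = FeatureAgglomeration(n_clusters=15)
    X_red = agglo.fit_transform(X_demo[train])
    clf = SVC(kernel='linear', C=1).fit(X_red, y_demo[train])
    # Project the cluster-level weights back to the original features
    coefs.append(agglo.inverse_transform(clf.coef_)[0])

# Ensemble step: average the weight maps obtained on the different folds
mean_coef = np.mean(coefs, axis=0)
print('averaged weight map shape:', mean_coef.shape)
```

Because every feature in a cluster shares the same weight within a fold, the averaged map tends to be smoother than the weights of a single model fit on all features.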

Exercises

  • What is the most difficult category to decode? Why?

  • The model seemed to overfit. Can you find a parameter value for C in SVC such that the model does not overfit as much?

  • Try a 'rbf' kernel in SVC. Can you get a better test accuracy than with the 'linear' kernel?

  • Try to explore the weights associated with other labels.

  • Instead of doing a 5-fold cross-validation, one should split the data by runs. Implement a leave-one-run-out and a leave-two-runs-out cross-validation. For that you will need to access the run information, which is stored in behavioral['chunks']. You will also need the LeavePGroupsOut object of scikit-learn.

  • Try implementing a random forest or k nearest neighbor classifier.

  • Hard: implement a systematic hyper-parameter optimization using nested cross-validation. Tip: check this scikit-learn tutorial.

  • Hard: try to account for class imbalance in the dataset.
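As a starting point for the exercise on run-based cross-validation, the sketch below shows how scikit-learn's group-aware splitters are called. The data are random, and the 4 runs of 20 time points each are an assumption for illustration. With the real data, `runs` would be replaced by behavioral['chunks'].

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, LeavePGroupsOut, cross_val_score
from sklearn.svm import SVC

# Toy data: 4 hypothetical runs of 20 time points each, 2 classes
rng = np.random.RandomState(0)
n_runs, n_per_run = 4, 20
X_demo = rng.normal(size=(n_runs * n_per_run, 10))
y_demo = np.tile([0, 1], n_runs * n_per_run // 2)
runs = np.repeat(np.arange(n_runs), n_per_run)   # run label of each time point

# Leave-one-run-out: one split per run
scores_logo = cross_val_score(SVC(kernel='linear'), X_demo, y_demo,
                              groups=runs, cv=LeaveOneGroupOut())
# Leave-two-runs-out: one split per pair of runs
scores_lpgo = cross_val_score(SVC(kernel='linear'), X_demo, y_demo,
                              groups=runs, cv=LeavePGroupsOut(n_groups=2))

print('leave-one-run-out splits:', len(scores_logo))
print('leave-two-runs-out splits:', len(scores_lpgo))
```

Splitting by runs keeps all time points of a run on the same side of the split, so that the test data come from runs never seen during training.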